Spectral Clustering for German Verbs
Abstract
We describe and evaluate the application of a spectral clustering technique (Ng et al., 2002) to the unsupervised clustering of German verbs. Our previous work has shown that standard clustering techniques succeed in inducing Levin-style semantic classes from verb subcategorisation information. However, clustering in the very high-dimensional spaces that we use is fraught with technical and conceptual difficulties. Spectral clustering performs a dimensionality reduction on the verb frame patterns, and provides robustness and efficiency that standard clustering methods do not display in direct use. The clustering results are evaluated according to the alignment (Cristianini et al., 2002) between the Gram matrix defined by the cluster output and the corresponding matrix defined by a gold standard.
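The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Gaussian affinity with a hypothetical width `sigma`, a toy matrix `X` standing in for verb frame distributions, a simple k-means with farthest-point initialisation, and a cluster Gram matrix whose entry is 1 when two verbs share a cluster and 0 otherwise. The function names are illustrative only.

```python
import numpy as np

def njw_embedding(X, k, sigma=1.0):
    """Spectral embedding in the style of Ng et al. (2002).

    Assumption: Gaussian affinity with width sigma (not stated in the abstract).
    """
    # Pairwise squared Euclidean distances between rows of X.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetrically normalised affinity D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the top-k eigenvectors.
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, -k:]
    # Rescale each row to unit length.
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

def kmeans(Y, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def cluster_gram(labels):
    """Assumed Gram matrix of a clustering: 1 if two items share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def alignment(K1, K2):
    """Frobenius inner product of K1 and K2, normalised by their norms."""
    return np.sum(K1 * K2) / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

# Toy data: two well-separated groups of five "verbs" with three features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 3)), rng.normal(3.0, 0.1, (5, 3))])
gold = [0] * 5 + [1] * 5

labels = kmeans(njw_embedding(X, k=2), k=2)
score = alignment(cluster_gram(labels), cluster_gram(gold))
print(labels, score)
```

Because the alignment compares Gram matrices and not label vectors, it does not depend on how the clusters happen to be numbered, which makes it convenient for scoring an unsupervised clustering against a gold standard.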
Similar resources
Exploring Soft-Clustering for German (Particle) Verbs across Frequency Ranges
In this paper we explore the role of verb frequencies and the number of clusters in soft-clustering approaches as a tool for automatic semantic classification. Relying on a large-scale setup including 4,871 base verb types and 3,173 complex verb types, and focusing on synonymy as a task-independent goal in semantic classification, we demonstrate that low-frequency German verbs are clustered sign...
Determining the Degree of Compositionality of German Particle Verbs by Clustering Approaches
This work determines the degree of compositionality of German particle verbs by two soft clustering approaches. We assume that the more compositional a particle verb is, the more often it appears in the same cluster with its base verb, after applying a probability threshold to establish cluster membership. As German particle verbs are difficult to approach automatically at the syntax-semantics ...
Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information
The paper describes the application of k-Means, a standard clustering technique, to the task of inducing semantic classes for German verbs. Using probability distributions over verb subcategorisation frames, we obtained an intuitively plausible clustering of 57 verbs into 14 classes. The automatic clustering was evaluated against independently motivated, hand-constructed semantic verb classes. A ...
Latent Semantic Clustering of German Verbs with Treebank Data
Treebank data have been utilized as data sources for a wide range of tasks in computational linguistics, including statistical parsing, anaphora resolution, induction of valence lexica, etc. More recently, researchers have experimented with extracting semantic information from syntactically annotated data. Here, treebank data have been used for the purposes of identifying selectional preference...
Experiments on the automatic induction of German semantic verb classes
This article presents clustering experiments on German verbs: A statistical grammar model for German serves as the source for a distributional verb description at the lexical syntax–semantics interface, and the unsupervised clustering algorithm k-means uses the empirical verb properties to perform an automatic induction of verb classes. Various evaluation measures are applied to compare the clu...